ABSTRACT

The effects of repeated IQ testing were investigated to determine
whether alternate test forms need to be constructed and used. An
attempt was also made to describe selected individual characteristics
of the subjects who improved most over repeated testing. One hundred
forty-five students were tested three times at one-month intervals.
Two forms of the Otis-Lennon Mental Ability Test were used in a
counterbalanced design. The total group improved only from the first
to the second testing session. Persons repeating the same form did
significantly better than persons taking alternate forms over the same
testing sessions. Students appeared to remember items from testing
session one to testing session two, but this trend did not hold into
testing session three; in general, mean scores decreased from testing
session two to testing session three. The persons who appeared to
improve most were girls, students from the upper class, or students
with relatively high grade point averages. (Author)
ABSTRACT



A standardized intelligence test was administered three times to
145 sixth-grade students at monthly intervals. A number of variables,
such as social status, sex, grade-point average (GPA), initial IQ test
score, and test-wiseness, were used to predict the resulting changes
in IQ test scores. Overall, the scores increased from the first to the
second testing session but decreased from the second to the third. The
students whose scores increased tended to be girls who were either from
the middle or upper class or had a relatively high fifth-grade GPA. The
conditions under which most studies of repeated testing occur usually
include some type of motivational technique. To obtain more reliable
estimates of the effects of variables related to standardized testing,
it seems imperative that standard testing conditions be adopted.



A review of previous research suggests that test scores, on the
average, increase when a standardized instrument is repeatedly
administered (Kreit, 1968; PMA, 1968; Dearborn and Rothney, 1941; Peel,
1952). Present measurement theory (Thorndike in Lindquist, 1951)
assumes that much of this increase is due to remembering specific test
items. However, many studies of repeated testing (PMA, 1958; Heim and
Wallace, 1949; Kreit, 1968; Droege, 1966) report similar results
whether the same form or alternate forms of a test are used, which
casts doubt on this assumption.

A number of studies have investigated the effects of repeated testing
with standardized instruments. The majority of these studies indicate
that scores for individuals fluctuate (Thouless, 1936), but that the
mean score for the group increases significantly from the first to the
second administration of the test (PMA, 1968; Dearborn and Rothney,
1941). These group gains tend to decrease with each subsequent
administration and are usually found to be non-significant after the
second testing session (Peel, 1952; Kreit, 1968). In previous
experiments, either all subjects were given the same form or all
subjects were given alternate forms of the testing instrument. The
relative effect of remembering specific test items could not be
investigated with these designs, because same-form and alternate-form
performance were never compared within a single study.
The purpose of the present study was to investigate the assumption
that remembering specific test items is a major determinant of
increases in test scores resulting from repeated testing. In addition,
an attempt was made to identify the types of students who improve most
over repeated testing.

METHOD 

Sample

All sixth-grade students of a rural school district in southeastern
Missouri were selected for the study. The school qualified for Title I
funds, with over 50% of the students' families receiving some form of
welfare aid.

Procedure 

All students were assigned identifying numbers, which were drawn
randomly from a bowl to form six groups of approximately 28 students
each. The six groups were administered two forms of the Otis-Lennon
Mental Ability Test as indicated in Table 1. Since members of each
group came from each of the six classrooms tested, the effects of
testing conditions and test administration differences were minimized.

As the design indicates, three groups took Form J first and three
groups took Form K first. The design was counterbalanced so that any
differences between Forms J and K were taken into account. By
administering the same form to some students each time and alternate
forms to other students, the effect of remembering specific test items
could be investigated.
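The counterbalanced design can be written down as data, which makes it easy to see which groups supply the same-form and alternate-form comparisons for each pair of sessions. This is a small sketch: the dictionary simply transcribes Table 1, and the names `DESIGN`, `split_by_form`, and `forms_per_session` are hypothetical.

```python
from collections import Counter

# Table 1: form (J or K) taken by each group at testing sessions 1, 2, and 3.
DESIGN = {
    1: ("J", "J", "J"),
    2: ("J", "J", "K"),
    3: ("J", "K", "J"),
    4: ("K", "K", "K"),
    5: ("K", "K", "J"),
    6: ("K", "J", "K"),
}

def split_by_form(design, first, second):
    """Return (same_form_groups, different_form_groups) for two sessions (numbered from 1)."""
    same, different = [], []
    for group, forms in sorted(design.items()):
        if forms[first - 1] == forms[second - 1]:
            same.append(group)
        else:
            different.append(group)
    return same, different

def forms_per_session(design):
    """Count how many groups take each form at each session (a counterbalancing check)."""
    n_sessions = len(next(iter(design.values())))
    return [Counter(forms[s] for forms in design.values()) for s in range(n_sessions)]

for first, second in ((1, 2), (2, 3), (1, 3)):
    same, different = split_by_form(DESIGN, first, second)
    print(f"TS{first} vs TS{second}: same form {same}, different forms {different}")
```

For TS1 vs TS2 this lists groups 1, 2, 4, and 5 as repeating the same form and groups 3 and 6 as switching forms; each session has three groups on Form J and three on Form K.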
Table 1

Procedure for the Administration of the Different Forms
of the Otis-Lennon Mental Ability Test

              Form Taken by Testing Session
Group         TS 1       TS 2       TS 3
  1            J          J          J
  2            J          J          K
  3            J          K          J
  4            K          K          K
  5            K          K          J
  6            K          J          K



The Index of Social Status (ISS), developed by McGuire and White
(1955), was used to compute the social status of each student. The ISS
provides a score that is a weighted sum of ratings of the occupation,
source of income, and education of the students' parents. Scores can
range from 12 (high) to 84 (low).

The Otis-Lennon Mental Ability Test is composed of 80 multiple-choice
items to be completed in 40 minutes. General-purpose answer sheets
were used with the test booklets. The test was administered at monthly
intervals (March 18, April 15, and May 13).

RESULTS 

To investigate the primary question concerning the effect of 
remembering specific test items, both the general, or overall, practice 
effect and the differential effect of the same form and alternate forms 
of the test were analyzed. Raw scores were used as the criterion scores. 

General Practice Effect

As indicated in Table 2, there was a significant increase from the
first to the second testing session, but not from the second to the
third or from the first to the third testing session. This result
agrees with previous findings and suggests that a practice effect
occurs quickly and then seems to disappear.

Table 2

Results of Three IQ Testing Sessions with All Subjects

Testing Session      Mean Score
      1                43.76
      2                46.59
      3                44.50

           Mean Difference Scores
TS2-TS1          TS3-TS2          TS3-TS1
 2.83*            -2.10              .74

*Significant at .01 level using correlated t-tests (one-tailed tests,
df = 144).
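The mean differences in Table 2 are simple subtractions of session means, and the significance test is a correlated (paired) t-test on each pupil's two scores with df = N - 1. A minimal sketch, assuming the standard paired-t formula (mean difference divided by its standard error); the function names are hypothetical, and since pupil-level scores are not reported here, `paired_t` is only checked on toy numbers.

```python
import math
import statistics

# Mean raw scores for all 145 subjects, copied from Table 2.
SESSION_MEANS = {1: 43.76, 2: 46.59, 3: 44.50}

def mean_difference(later, earlier, means=SESSION_MEANS):
    """Difference between two session means, rounded to two decimals."""
    return round(means[later] - means[earlier], 2)

def paired_t(before, after):
    """Correlated (paired) t statistic for after - before, with df = n - 1."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

for later, earlier in ((2, 1), (3, 2), (3, 1)):
    print(f"TS{later}-TS{earlier}: {mean_difference(later, earlier):+.2f}")
```

Computed from the rounded means, TS3-TS2 comes out as -2.09 rather than the tabled -2.10, presumably because the table was computed before rounding. With 145 pupils, `paired_t(ts1_scores, ts2_scores)` would return df = 144, as in the table note; a one-tailed p-value would then come from the upper tail of the t distribution (for example `scipy.stats.t.sf(t, df)`, if SciPy is available).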



Practice Effect of Different vs. Same Form of Test 

As indicated in Table 3, students who took the same form during
the first two administrations increased significantly more than
students who took different forms. Multiple linear regression
procedures (Kelly, Beggs, McNeil, Eichelberger, and Lyon, 1969) were
used for the statistical analyses. To test the effect of differences
in test form, the following models were used:






$$Y = a_1 X_1 + a_2 X_2 + E$$

where

$Y$ = gain scores on the Otis-Lennon Mental Ability Test;
$X_1$ = 1 if the person took the same form of the test in both testing
sessions in question, 0 otherwise;
$X_2$ = 1 if the person took different forms of the test in the two
testing sessions in question, 0 otherwise;
$U$ = unit vector (every element equal to 1, so that $U = X_1 + X_2$);
$a_1$ and $a_2$ = least-squares weights;
$E$ = error vector ($Y - \hat{Y}$).

The hypothesis that the two groups were from the same population was
tested by assuming $a_1 = a_2$. The resulting equation in this case was:

$$Y = a_1 U + E$$

The proportions of criterion variance accounted for by the full and
restricted models were compared for each testing period. The results
of these tests are reported in Table 3.
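The comparison of the full and restricted models can be sketched as follows. This is a minimal illustration that assumes the usual F statistic for comparing two nested least-squares models, F = [(R²_full − R²_restricted)/(k_full − k_restricted)] / [(1 − R²_full)/(N − k_full)]. With N = 145 this gives the 1 and 143 degrees of freedom shown in Table 3. The function names and the gain scores in the example are hypothetical, not the study's data.

```python
import numpy as np

def r_squared(y, X):
    """Proportion of criterion variance accounted for by a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def compare_models(y, X_full, X_restricted):
    """F test of a restricted model against a full model (R^2 comparison)."""
    n = len(y)
    k_full = np.linalg.matrix_rank(X_full)
    k_restricted = np.linalg.matrix_rank(X_restricted)
    df1, df2 = k_full - k_restricted, n - k_full
    r2_full, r2_restricted = r_squared(y, X_full), r_squared(y, X_restricted)
    f = ((r2_full - r2_restricted) / df1) / ((1.0 - r2_full) / df2)
    return f, int(df1), int(df2)

def same_vs_different_f(gains_same, gains_different):
    """Full model Y = a1*X1 + a2*X2 + E versus restricted model Y = a1*U + E."""
    y = np.concatenate([gains_same, gains_different]).astype(float)
    x1 = np.concatenate([np.ones(len(gains_same)), np.zeros(len(gains_different))])
    x2 = 1.0 - x1                      # X1 + X2 = U, the unit vector
    u = np.ones_like(y)
    return compare_models(y, np.column_stack([x1, x2]), u[:, None])

# Hypothetical gain scores, for illustration only.
f, df1, df2 = same_vs_different_f([5, 3, 6, 2], [1, -1, 0, 0])
print(round(f, 3), df1, df2)   # 16.0 1 6
```

With two dummy-coded groups, this F equals the square of the pooled-variance two-sample t, so the regression comparison and a conventional two-group test agree.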

Table 3

Mean Gain Scores for Students Taking the Same Form vs. Students Taking
Alternate Forms of IQ Test During the Two Testing Periods

Testing     Same Form    Different Forms
Period      (N = 99)        (N = 46)          F       df1     df2
TS2-TS1       4.05            .15           13.2*      1      143
TS3-TS2      -3.35          -1.42            1.8       1      143
TS3-TS1        .75            .71             .01      1      143

*Significant at .01 level (one-tailed).
Analysis of Response Patterns to the Items

The response patterns of all students who took the same test form
during the first two testing sessions (TS1 and TS2), the only time a
significant change occurred, were analyzed in the following manner.
Since nearly all students answered all 80 items during both testing
sessions, only the following four combinations were studied: (1)
answered correctly both times (++), (2) answered incorrectly both
times (--), (3) answered correctly during TS1 and incorrectly during
TS2 (+-), and (4) answered incorrectly during TS1 and correctly during
TS2 (-+). In Table 4 the results are broken down by high social status
(HSS) and low social status (LSS), separately for students whose
scores increased on TS2 and for those whose scores did not. Because
the students tended to have low social status, the dividing point
between high and low social status was arbitrarily set between scores
of 62 and 63 on the ISS. McGuire and White (1955) report that these
scores are indicative of lower middle class status.
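The classification and tallying procedure just described can be sketched in a few lines. The sketch assumes each student's answers are available as two equal-length lists of True (correct) and False (incorrect) per item, and it uses the ISS cutoff between 62 and 63 stated above. The names `iss_group`, `response_combinations`, and `mean_combinations_by_status` are hypothetical.

```python
from collections import Counter, defaultdict

COMBINATIONS = ("++", "+-", "-+", "--")  # TS1 result first, then TS2 result

def iss_group(iss_score):
    """HSS for ISS 12-62, LSS for 63-84 (a lower ISS score means higher status)."""
    if not 12 <= iss_score <= 84:
        raise ValueError(f"ISS scores range from 12 to 84, got {iss_score}")
    return "HSS" if iss_score <= 62 else "LSS"

def response_combinations(ts1, ts2):
    """Count the four combinations for one student; True means answered correctly."""
    if len(ts1) != len(ts2):
        raise ValueError("both sessions must cover the same items")
    sign = {True: "+", False: "-"}
    counts = Counter(sign[a] + sign[b] for a, b in zip(ts1, ts2))
    return {combo: counts[combo] for combo in COMBINATIONS}

def mean_combinations_by_status(students):
    """students: iterable of (iss_score, ts1_items, ts2_items) tuples."""
    totals, sizes = defaultdict(Counter), Counter()
    for iss_score, ts1, ts2 in students:
        group = iss_group(iss_score)
        totals[group].update(response_combinations(ts1, ts2))
        sizes[group] += 1
    return {g: {c: totals[g][c] / sizes[g] for c in COMBINATIONS} for g in totals}

# Hypothetical six-item example.
ts1 = [True, True, False, False, True, False]
ts2 = [True, False, True, False, True, True]
print(response_combinations(ts1, ts2))  # {'++': 2, '+-': 1, '-+': 2, '--': 1}
```

Running `mean_combinations_by_status` separately for increasing and non-increasing students would yield a layout like Table 4.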

Table 4

Mean Item Responses for Increasing and Non-Increasing Subjects
of High and Low Social Status

Response
Combination            Increasing                 Non-Increasing
(TS1 TS2)        HSS (N=27)   LSS (N=49)     HSS (N=8)    LSS (N=15)
  + +               49.5         31.6           27.5         25.6
  + -                5.7          7.2           13.5         12.8
  - +               10.8         11.7           10.5         10.4
  - -               11.9         23.3           27.6         29.2
As indicated in Table 4, the increasing students with HSS (ISS 12-62)
averaged 49.5 ++ items, while the increasing students with LSS (ISS
63-84) averaged 31.6. On -- items, HSS subjects in this group averaged
11.92, while LSS subjects averaged 23.3. In both cases the increasing
students with LSS did not score much differently from the
non-increasing students. The overall differences between the
increasing and non-increasing groups therefore appear to be due
primarily to the increasing students with HSS.
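Because each correct answer counts one point, the Table 4 means imply each subgroup's average number correct on the analyzed items: TS1 correct = (++) + (+-) and TS2 correct = (++) + (-+), so the change is (-+) minus (+-). The sketch below only performs that arithmetic on the tabled means. It counts only items falling into the four combinations, so the implied changes need not match the Table 3 gain of 4.05 exactly. The names `TABLE_4` and `implied_scores` are hypothetical.

```python
# Mean item counts per combination, transcribed from Table 4.
TABLE_4 = {
    ("Increasing", "HSS"): {"++": 49.5, "+-": 5.7, "-+": 10.8, "--": 11.9},
    ("Increasing", "LSS"): {"++": 31.6, "+-": 7.2, "-+": 11.7, "--": 23.3},
    ("Non-increasing", "HSS"): {"++": 27.5, "+-": 13.5, "-+": 10.5, "--": 27.6},
    ("Non-increasing", "LSS"): {"++": 25.6, "+-": 12.8, "-+": 10.4, "--": 29.2},
}

def implied_scores(row):
    """Return (TS1 correct, TS2 correct, change) implied by one column of Table 4."""
    ts1 = row["++"] + row["+-"]   # items right at TS1
    ts2 = row["++"] + row["-+"]   # items right at TS2
    return round(ts1, 1), round(ts2, 1), round(ts2 - ts1, 1)

for (status, ses), row in TABLE_4.items():
    ts1, ts2, change = implied_scores(row)
    print(f"{status:<15} {ses}: TS1 {ts1:5.1f}  TS2 {ts2:5.1f}  change {change:+.1f}")
```

The implied changes are positive for both increasing subgroups and negative for both non-increasing subgroups, which is consistent with how the groups were defined.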

Summary of Results Related to Remembering Specific Test Items

These results suggest that a practice effect occurs rapidly and then
dissipates. The practice effect does appear to be due to students
remembering specific test items, as students repeating the same form
improved by an average of 4.05 items, while students taking different
forms improved by an average of only .15 items. Therefore, the
assumption does appear to be supported.

At least two observations require tempering of this conclusion.
First, the test did not appear to have an adequate ceiling. One
student answered all 80 items correctly during one testing session and
79 items during a second session, and a number of students answered
over 70 items correctly during all testing sessions. Thus, there was
little room for other test-taking skills, such as use of time or use
of the answer sheet, to show up as related to improvement. Second,
nearly all previous studies indicate continuous improvement, even over
9 or 10 administrations (Heim and Wallace, 1949), whereas scores
decreased from TS2 to TS3 in this study.

A possible explanation for these results, especially as they differ in
many ways from previous results, is a motivational one based on the
situation in which the tests were given. Almost no previous study
reported a decrease in average score from the second to the third
testing session. In nearly all previous studies some attempt was made
to motivate the subjects to improve (e.g., volunteer subjects were
used, rewards were given, or directions were read which indicated that
improvement was expected).
In the present study no such attempts were made. The students were
told that the results would be given to the school administrators, but
no indication was given either that the scores would count toward
their grades or that they would be retested. The person administering
the tests simply indicated, if he said anything at all, that he was
interested in finding out what would happen to the scores. Only after
the final administration were the children told that the testing was
over. The teachers and administrators of the school system were
extremely cooperative, and there appeared to be no hostility toward
either the experimenter or the time taken from class. The students
also left many more blank spaces on the third test than on the second
one. During TS3 a number of students completed the first row on the
answer sheet and then stopped, even though the same format had been
used twice previously and they knew how to carry out the task.
Therefore, it appears that many students simply became bored with the
task on the third testing (three IQ tests in two months). Subsequent
studies should take this possibility into account.

Identifying Students Improving Most on the Tests 

Six predictor variables were used to describe the participating
sixth-grade students: (1) social status (ISS), (2) initial IQ test
score, (3) fifth-grade grade point average (GPA), (4) sex, (5)
test-wiseness, and (6) a moderator variable, (GPA/IQ) × ISS. The
measurement of each variable has been described above except for
test-wiseness, which was measured by a 16-item instrument originally
devised by Slakter and Koehler (1969). The rationale for including
these particular variables is given elsewhere (Eichelberger, 1970).

The independent contribution of each variable to the change in test
scores was investigated from TS1 to TS2, TS2 to TS3, and TS1 to TS3.
The multiple linear regression approach was used in the following
manner to analyze the data:

$$Y = a_0 U + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_4 + a_5 X_5 + a_6 X_6 + E$$

where

$Y$ = change in test score;
$X_1$ = ISS score;
$X_2$ = initial IQ test score;
$X_3$ = fifth-grade GPA;
$X_4$ = sex;
$X_5$ = test-wiseness score;
$X_6$ = moderator variable, $(X_3 / X_2) \times X_1$;
$U$ = unit vector;
$a_0$ through $a_6$ = least-squares weights;
$E$ = error vector ($Y - \hat{Y}$).

Each predictor variable was dropped in turn from the equation to test its 
independent contribution to the change in test scores during the different 
testing sessions. 
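The drop-one procedure can be sketched the same way. The sketch assumes each restricted model keeps the unit vector and the five remaining predictors, so each F test has 1 and N − 7 degrees of freedom (1 and 138 for N = 145, as reported in Table 5). The 0/1 coding of sex, the function names, and the simulated data in the usage lines are all hypothetical.

```python
import numpy as np

def build_predictors(iss, iq1, gpa, sex, testwise):
    """Stack X1..X6; X6 is the moderator (GPA / initial IQ) * ISS."""
    iss, iq1, gpa = (np.asarray(v, dtype=float) for v in (iss, iq1, gpa))
    moderator = (gpa / iq1) * iss
    return np.column_stack([iss, iq1, gpa, sex, testwise, moderator])

def r_squared(y, X):
    """Proportion of variance in y accounted for by a least-squares fit on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())

def drop_one_f_tests(y, predictors):
    """F test for each predictor's unique contribution, dropping it in turn."""
    y = np.asarray(y, dtype=float)
    n, p = predictors.shape
    u = np.ones((n, 1))
    full = np.hstack([u, predictors])
    r2_full = r_squared(y, full)
    df2 = n - full.shape[1]          # N - 7 for six predictors plus U
    results = []
    for j in range(p):
        restricted = np.hstack([u, np.delete(predictors, j, axis=1)])
        r2_restricted = r_squared(y, restricted)
        f = (r2_full - r2_restricted) / ((1.0 - r2_full) / df2)
        results.append((j, f, 1, df2))
    return results

# Usage with simulated (hypothetical) data for N = 145 pupils.
rng = np.random.default_rng(0)
n = 145
X = build_predictors(
    iss=rng.integers(12, 85, n),
    iq1=rng.uniform(20, 75, n),
    gpa=rng.uniform(1.0, 4.0, n),
    sex=rng.integers(0, 2, n),
    testwise=rng.integers(0, 17, n),
)
y = 0.8 * X[:, 2] + rng.normal(0.0, 1.0, n)  # simulated change driven mostly by GPA
names = ["ISS", "IQ1", "GPA", "SEX", "TESTWISE", "MODERATOR"]
for j, f, df1, df2 in drop_one_f_tests(y, X):
    print(f"{names[j]:<10} F = {f:6.2f}  df = {df1},{df2}")
```

Each drop-one F equals the squared t statistic of that predictor's weight in the full model, so this procedure tests each variable's unique contribution with the other five held in the equation.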

RESULTS 

The results indicated that GPA was the only significant (α < .05)
predictor from TS1 to TS2, while sex and social status (ISS) were
significant predictors from TS2 to TS3 and from TS1 to TS3. The
proportion of criterion variance accounted for by the predictor set in
each case is indicated in Table 5.
Table 5

Results of Predicting Test Score Change by 6 Predictor Variables

                 Obtained    Significant
Testing Period      R²        Predictors        p        df
TS2-TS1           .0656       GPA             .013     1,138
TS3-TS2           .1187       SEX             .003     1,138
                              ISS             .037     1,138
TS3-TS1           .1447       SEX             .001     1,138
                              ISS             .008     1,138

Note: Significance level: α < .05, N = 145.







Although only a small proportion of the change in test scores was
predicted, some inferences concerning the types of students most
likely to improve on repeated testing are possible. Further inspection
and analysis of the data indicated that girls with middle or upper
status, or with a relatively high GPA, were most likely to improve on
repeated IQ testing.

Again, these results may be peculiar to the situation in which this
study was done. Perhaps upper class girls are more willing to persevere
when given an apparently boring task. Also, these results might not
replicate when the interval between testing sessions is closer to what
normally occurs within a school, i.e., a full year. But numerous
theories would lead one to expect that sixth-grade girls from middle
or upper class families, or with a history of high academic
achievement, would be the most likely to concentrate and work hard to
score well on standardized tests given within their schools. The
results from this study tend to support these theories.
SUMMARY

In summary, the data reported tend to support the assumption that
students remember specific test items, at least over a period of one
month. Other test-taking skills did not appear to be significant
predictors of test score change. Persons who tended to improve most on
repeated testing were girls, students from the upper class, or
students with relatively high GPAs. Motivational effects on repeated
testing appeared to be especially detrimental to an in-depth analysis
of the two research concerns: remembering specific test items and
identifying students whose scores increased.

Further studies on repeated testing should attempt to standardize the
conditions under which tests are given, while making sure that
students are adequately motivated to do their best during each testing
session. It would appear that only when a number of different
researchers attempt to work under relatively standard conditions will
the effects of variables related to changes in standardized test
scores be adequately evaluated.